Results of the ISMRM 2020 joint Reproducible Research & Quantitative MR study groups reproducibility challenge on phantom and human brain T1 mapping

*Mathieu Boudreau1,2, *Agah Karakuzu1, Julien Cohen-Adad1,3,4,5, Ecem Bozhurt6, Madeline Carr7,8, Marco Castellaro9, Luis Concha10, Mariya Doneva11, Seraina Dual12, Alex Ensworth13,14, Alexandru Foias1, Véronique Fortier15,16, Refaat E. Gabr17, Guillaume Gilbert18, Carri K. Glide-Hurst19, Matthew Grech-Sollars20,21, Siyuan Hu22, Oscar Jalnefjord23,24, Jorge Jovicich25, Kübra Keskin6, Peter Koken11, Anastasia Kolokotronis13,26, Simran Kukran27,28, Nam. G. Lee6, Ives R. Levesque13,29, Bochao Li6, Dan Ma22, Burkhard Mädler30, Nyasha Maforo31,32, Jamie Near33,34, Erick Pasaye35, Alonso Ramirez-Manzanares36, Ben Statton37,Christian Stehning30, Stefano Tambalo25, Ye Tian6, Chenyang Wang38, Kilian Weiss30, Niloufar Zakariaei39, Shuo Zhang30, Ziwei Zhao6, Nikola Stikov1,2,40

  • *Authors MB and AK contributed equally to this work

1NeuroPoly Lab, Polytechnique Montréal, Montreal, Quebec, Canada, 2Montreal Heart Institute, Montreal, Quebec, Canada, 3Unité de Neuroimagerie Fonctionnelle (UNF), Centre de recherche de l’Institut Universitaire de Gériatrie de Montréal (CRIUGM), Montreal, Quebec, Canada, 4Mila - Quebec AI Institute, Montreal, QC, Canada, 5Centre de recherche du CHU Sainte-Justine, Université de Montréal, Montreal, QC, Canada, 6Magnetic Resonance Engineering Laboratory (MREL), University of Southern California, Los Angeles, California, USA, 7Medical Physics, Ingham Institute for Applied Medical Research, Liverpool, Australia, 8Department of Medical Physics, Liverpool and Macarthur Cancer Therapy Centres, Liverpool, Australia, 9Department of Information Engineering, University of Padova, Padova, Italy, 10Institute of Neurobiology, Universidad Nacional Autónoma de México Campus Juriquilla, Querétaro, México, 11Philips Research Hamburg, Germany, 12Stanford University, Stanford, California, United States, 13Medical Physics Unit, McGill University, Montreal, Canada, 14University of British Columbia, Vancouver, Canada, 15Department of Medical Imaging, McGill University Health Centre, 16Department of Radiology, McGill University, Montreal, Canada, 17Department of Diagnostic and Interventional Imaging, University of Texas Health Science Center at Houston, McGovern Medical School, Houston, Texas, USA, 18MR Clinical Science, Philips Canada, Mississauga, Ontario, Canada, 19Department of Human Oncology, University of Wisconsin-Madison, Madison, Wisconsin, USA, 20Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK, 21Lysholm Department of Neuroradiology, National Hospital for Neurology and Neurosurgery, University College London Hospitals NHS Foundation Trust, London, UK, 22Department of Biomedical Engineering, Case Western Reserve University, Cleveland, Ohio, USA, 23Department of Medical Radiation Sciences, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden Department of Medical Physics, 24Biomedical Engineering, Sahlgrenska University Hospital, Gothenburg, Sweden, 25Center for Mind/Brain Sciences, University of Trento, Italy, 26Hopital Maisonneuve-Rosemont, Montreal, Canada, 27Bioengineering, Imperial College London, 28Radiotherapy and Imaging, Insitute of Cancer Research, 29Research Institute of the McGill University Health Centre, Montreal, Canada, 30Clinical Science, Philips Healthcare, Germany, 31Department of Radiological Sciences, University of California Los Angeles, Los Angeles, CA, USA, 32Physics and Biology in Medicine IDP, University of California Los Angeles, Los Angeles, CA, USA, 33Douglas Brain Imaging Centre, Montreal, Canada, 34Sunnybrook Research Institute, Toronto, Canada, 35National Laboratory for Magnetic Resonance Imaging, Institute of Neurobiology, Universidad Nacional Autónoma de México (UNAM), Juriquilla, Mexico, 36Computer Science Department, Centro de Investigación en Matemáticas, A.C., Guanajuato, México, 37Medical Research Council, London Institute of Medical Sciences, Imperial College London, London, United Kingdom, 38Department of Radiation Oncology - CNS Service, The University of Texas MD Anderson Cancer Center, 39Department of Biomedical Engineering, University of British Columbia, British Columbia, Canada, 40Center for Advanced Interdisciplinary Research, Ss. Cyril and Methodius University, Skopje, North Macedonia

Abstract#

Purpose: T1 mapping is a widely used quantitative MRI technique, but its tissue-specific values remain inconsistent across protocols, sites, and vendors. The ISMRM Reproducible Research study group (RRSG) and Quantitative MR study group (qMRSG) jointly launched a T1 mapping reproducibility challenge. The goal was to investigate whether an independently-implemented image acquisition protocol at multiple centers could reliably measure T1 using inversion recovery in a standardized phantom and in brains of healthy volunteers.

Methods: The reproducibility of a robust T1 mapping protocol and fitting algorithm from a well-known Magnetic Resonance in Medicine publication (Barral et al. 2010) were evaluated. Participants in the challenge were invited to acquire T1 mapping data on a standard ISMRM/NIST phantom and/or in healthy human brains. Data submission, pipeline development, and analysis were performed using open-source platforms for increased reproducibility and transparency. For participants acquiring data at multiple sites, inter-submission and intra-submission comparisons were conducted by selecting one dataset per submission.

Results: Eighteen submissions were accepted using data collected from MRIs from three manufacturers (Siemens, GE, Philips) and at 3T (except one submission, at 0.35T), resulting in a total of 38 datasets acquired in phantoms and 56 datasets in healthy human subjects. Inter-submission mean coefficient of variation (CoV) was 6.1% for phantom measurements, nearly twice as high as the evaluated intra-submission CoV (2.9%). A similar trend was observed in the human data (inter-/intra- submission CoV was 6.0/2.9 % in the genu and 16/6.9 % in the cortical gray matter).

Conclusion: The imaging protocol and fitting algorithm from Barral et al. resulted in good reproducibility of both phantom and human brain T1 measurements. Differences in implementations between submissions resulted in higher variance in reported values relative to intra-submission variability. This challenge generated a large open database of inversion recovery T1 mapping acquisitions across several sites and MRI vendors accessible at osf.io/ywc9g/. To allow for the wider community to access and engage with this dataset, we developed an interactive dashboard that is available at https://rrsg2020.dashboards.neurolibre.org.

Dashboard: Challenge Submissions

1     |     INTRODUCTION#

Significant challenges exist in the reproducibility of quantitative MRI (qMRI) (Keenan et al. 2019). Despite its promise of improving the specificity and reproducibility of MRI acquisitions, few qMRI techniques have been integrated into clinical practice. Even the most fundamental MR parameters cannot be measured with sufficient reproducibility and precision across clinical scanners to pass the second of six stages of technical assessment for clinical biomarkers (Fryback and Thornbury 1991; Schweitzer 2016; Seiberlich et al. 2020). Half a century has passed since the first quantitative T1 (spin-lattice relaxation time) measurements were first reported as a potential biomarker for tumors (Damadian 1971), followed shortly thereafter by the first in vivo quantitative T1 maps (Pykett and Mansfield 1978) of tumors, but there is still disagreement in reported values for this fundamental parameter across different sites, vendors, and implementations (Stikov et al. 2015).

Among fundamental MRI parameters, T1 holds significant importance (Boudreau, Keenan, and Stikov 2020). It represents the time it takes for the longitudinal magnetization to recover after being disturbed by an RF pulse. The T1 value varies based on molecular mobility and magnetic field strength (Bottomley et al. 1984; Wansapura et al. 1999; Dieringer et al. 2014), making it a valuable parameter for distinguishing different tissue types. Accurate knowledge of T1 values is essential for optimizing clinical MRI pulse sequences for contrast and time efficiency (Ernst and Anderson 1966; Redpath and Smith 1994; Tofts 1997) and as a calibration parameter for other quantitative MRI techniques (Sled and Pike 2001; Yuan et al. 2012). Among the number of techniques to measure T1, inversion recovery (IR) (Drain 1949; Hahn 1949) is widely held as the gold standard technique, as it is robust against other effects (e.g. B1 inhomogeneity) and potential errors in measurements (e.g. insufficient spoiling) (Stikov et al. 2015). However, because the technique requires a long repetition time (TR > T1), it is very slow and impractical for whole-organ measurements, limiting its clinical use. In practice, it is mostly used as a reference to validate other T1 mapping techniques, such as variable flip angle imaging (VFA) (Fram et al. 1987; Deoni, Rutt, and Peters 2003; Cheng and Wright 2006), Look-Locker (Look and Locker 1970; Messroghli et al. 2004; Piechnik et al. 2010), and MP2RAGE (Marques et al. 2010; Marques and Gruetter 2013).

Efforts have been made to develop quantitative MRI phantoms to assist in standardizing T1 mapping methods (Keenan et al. 2018). A quantitative MRI standard system phantom was created in a joint project between the International Society for Magnetic Resonance in Medicine (ISMRM) and the National Institute of Standards and Technology (NIST) (Stupic et al. 2021), and has since been commercialized (Premium System Phantom, CaliberMRI, Boulder, Colorado). The spherical phantom has a 57-element fiducial array containing spheres with doped liquids that model a wide range of T1, T2, and PD values. The reference values of each sphere were measured using NMR at 1.5T and 3T (Stupic et al. 2021). The standardized concentration for relaxometry values established as references by NIST are also used by another company for their relaxometry MRI phantoms (Gold Standard Phantoms Ltd., Rochester, England). The cardiac TIMES phantom (Captur et al. 2016) is another commercially available system phantom focusing on T1 and T2 values in blood and heart muscles, pre- and post-contrast. The ISMRM/NIST phantom has been used in several large multicenter studies already, for example in (Bane et al. 2018) where they compared measurements at eight sites on a single ISMRM/NIST phantom using the inversion recovery and VFA T1 mapping protocols recommended by NIST, as well as some site-specific imaging protocols used for dynamic contrast enhanced (DCE) imaging. Bane et al. (Bane et al. 2018) concluded that the acquisition protocol, field strength, and T1 value of the sample impacted the level of accuracy, repeatability, and interplatform reproducibility that was observed. In another study led by NIST researchers (Keenan et al. 2021), T1 measurements were done at two clinical field strengths (1.5T and 3.0 T) and 27 MRI systems (three vendors) using the recommended NIST protocols. That study, which only investigated phantoms, found no significant relationship between T1 discrepancies of the measurements and the MRI vendors used.

The 2020 ISMRM reproducibility challenge1 posed the following question: will an imaging protocol independently-implemented at multiple centers reliably measure what is considered one of the fundamental MR parameters (T1) using the most robust technique (inversion recovery) in a standardized phantom (ISMRM/NIST system phantom) and in the healthy human brain? More broadly, this challenge aimed at assessing the reproducibility of a qMRI method presented in a seminal paper, (Barral et al. 2010), by evaluating the variability in measurements observed by different research groups that implemented this imaging protocol. As the focus of this challenge was on reproducibility, the challenge design emphasized the use of reproducible research practices, such as sharing code, pipelines, data, and scripts to reproduce figures.

2     |     METHODS#

2.1     |     Materials#

The phantom portion of the challenge was launched for those with access to the International Society of Magnetic Resonance in Medicine/National Institute of Standards and Technology (ISMRM/NIST) system phantom (Stupic et al. 2021) (Premium System Phantom, CaliberMRI, Boulder, Colorado). Two versions of the phantom have been produced with slightly different T1/T2/PD values in the liquid spheres. Phantoms with serial numbers 0042 or less are referred to as “Version 1”, and those 0043 or greater are “Version 2”. The phantom has three plates containing sets of 14 spheres for ranges of proton density (PD), T1 (NiCl2), and T2 (MnCl2) values. Reference T1 values at 20 °C and 3.0 T for the T1 plate are listed in Table 1 for both versions of the phantom. Participants were instructed to record the temperature before and after scanning the phantom using the phantom's internal thermometer. Instructions for positioning and setting up the phantom were provided to participants through the NIST website2.

Sphere #

Version 1 (ms)

Version 2 (ms)

1

1989 ± 1.0

1883.97 ± 30.32

2

1454 ± 2.5

1330.16 ± 20.41

3

984.1 ± 0.33

987.27 ± 14.22

4

706 ± 1.0

690.08 ± 10.12

5

496.7 ± 0.41

484.97 ± 7.06

6

351.5 ± 0.91

341.58 ± 4.97

7

247.13 ± 0.086

240.86 ± 3.51

8

175.3 ± 0.11

174.95 ± 2.48

9

125.9 ± 0.33

121.08 ± 1.75

10

89.0 ± 0.17

85.75 ± 1.24

11

62.7 ± 0.13

60.21 ± 0.87

12

44.53 ± 0.090

42.89 ± 0.44

13

30.84 ± 0.016

30.40 ± 0.62

14

21.719 ± 0.005

21.44 ± 0.31

TABLE 1 Reference3 T1 values of the “T1 plate” of the standard phantom (for both phantom versions) measured at 20 °C and 3.0 T. Phantoms with serial numbers 0042 or less are referred to as “Version 1”, and those 0043 or greater are “Version 2”.

Participants were also instructed to collect healthy human brain T1 maps, if possible, by following their institutional ethical guidelines and with participants' consent to participate in the challenge. To ensure consistency across datasets, single-slice positioning parallel to the AC-PC line was recommended. Before the scanning process, the participants granted their consent4 to share their de-identified data openly with the challenge organizers and on the website Open Science Framework (OSF.io). As the submitted single-slice inversion recovery images would be along the AC-PC line, they are unlikely to contain sufficient information for facial identification, and therefore participants were not instructed to de-face their data. The researchers who submitted human data for the challenge provided written confirmation to the organizers that their data was acquired in accordance with their institutional ethics committee (or equivalent regulatory body) and that the subjects had consented to sharing their data as described in the challenge.

2.2     |     Protocol#

Participants were instructed to acquire the T1 mapping data using the spin-echo inversion recovery protocol for T1 mapping as reported in (Barral et al. 2010), as detailed in Table 2. This protocol uses four inversion times optimized for human brain T1 values and uses a relatively short TR (2550 ms). It is important to note that this acquisition protocol is not suitable for T1 mapping fitting models that assume TR > 5T1. Instead, more general models of inversion recovery, such as the Barral et al. fitting model described in Section 2.4.1, can be used to fit this data.

Pulse Sequence

Spin-echo inversion recovery

Repetition Time (TR)

2550 ms

Inversion Time (TI)

50, 400, 1100, 2500 ms

Echo Time (TE)

14 ms

In-plane resolution

1x1 mm2

Slice thickness

2 mm

TABLE 2 Imaging protocol for inversion recovery T1 mapping proposed to the participants for the 2020 joint RRSG-qMRSG reproducibility challenge. The protocol is the brain imaging protocol used in (Barral et al. 2010), and which is meant for the T1 values observed in healthy human brains.

Participants were advised to adhere to this protocol as closely as possible, but to report any differences in protocol parameters due to technical limitations of their scanners and/or software. It was recommended that participants submit complex data (magnitude and phase, or real and imaginary), but magnitude-only data was also accepted if complex data could not be exported.

2.3     |     Data Submissions#

Data submissions for the challenge were managed through a dedicated repository on GitHub, accessible at https://github.com/rrsg2020/data_submission. This allowed transparent and open review of the submissions, as well as better standardization of the process. All datasets had to be converted to the NIfTI file format, and images from different TIs needed to be concatenated into the fourth (or “time”) dimension. Magnitude-only datasets required one NIfTI file, while complex datasets required two files (magnitude and phase, or real and imaginary). Additionally, a configuration file containing submission, dataset, and acquisition details (such as data type, submitter name and email, site details, phantom or volunteer details, and imaging protocol details) were required for each submitted dataset to ensure that the information was standardized and easily found. Each submission was reviewed to confirm that guidelines were followed, and then datasets and configuration files were uploaded to OSF.io. A Jupyter Notebook (Kluyver et al. 2016; Beg et al. 2021) pipeline was used to generate T1 maps at this stage, for the purposes of quality assurance. Links to the Jupyter Notebook for reproducing the T1 map were shared for each submission using the MyBinder platform, ensuring that computation environments were reproducible without the need for installation of software packages on local computers.

2.4     |     Data Processing#

2.4.1     |     Fitting Model and Pipeline#

A reduced-dimension non-linear least squares (RD-NLS) approach was used to fit the complex general inversion recovery signal equation:

(1)#\[S(TI) = a + be^{-TI/T_1}\]

where a and b are complex constants. This approach, introduced in (Barral et al. 2010), models the general T1 signal equation without the long-TR approximation. The a and b constants inherently factor TR in them. Barral et al. shared the implementation of the fitting algorithm used in their paper5. To facilitate its use in our pipelines, we used a wrapper around this code available in the open-source software qMRLab (Cabana et al. 2015; Karakuzu et al. 2020), which provides a standardized API to call the fitting in MATLAB/Octave scripts.

A Jupyter Notebook data processing pipeline was written using MATLAB/Octave. This pipeline automatically downloads all the datasets, loads each dataset configuration file, fits the T1 data voxel-wise, and exports the resulting T1 map to the NIfTI and PNG formats for quality assurance. This pipeline is available in a GitHub repository (https://github.com/rrsg2020/t1_fitting_pipeline, filename: RRSG_T1_fitting.ipynb). Once all submissions were collected and the pipeline was executed, the T1 maps were uploaded to OSF (https://osf.io/ywc9g/).

2.4.2     |     Image Labeling & Registration#

The T1 plate of the phantom had 14 spheres that were labeled as the regions-of-interest (ROI) using a numerical mask template created in MATLAB, provided by NIST researchers (Figure 1-a). To avoid potential edge effects in the T1 maps, the ROI labels were reduced to 60% of the expected sphere diameter. A registration pipeline in Python using the Advanced Normalization Tools (ANTs) (Avants, Tustison, and Song 2009) was developed and shared in the analysis repository of our GitHub organization (https://github.com/rrsg2020/analysis, filename: register_t1maps_nist.py). The ROI labels template was nonlinearly registered to each T1 map uploaded to OSF.io.

_images/fig1.png

FIGURE 1 ROI selection for the NIST phantom (a) and the human brain (b). a) The 14 ROIs (shades of blue/green) were automatically generated using a script provided by NIST. In yellow are the three reference pins in the phantom, i.e. these are not ROIs or spheres. b) ROIs were manually segmented in the human brains in four regions: the genu (yellow, 5x5 voxels), splenium (green, 5x5 voxels), deep gray matter (blue, 5x5 voxels), and cortical gray matter (red, three sets of 3x3 voxels). Note: due to differences in slice positioning from the single-slice datasets provided by certain sites, for some datasets it was not possible to manually segment an ROI in the genu or deep gray matter. In the case of the missing genu, left or right frontal white matter was selected; for deep grey matter, it was omitted entirely for those cases.

Manual ROIs were segmented using FSLeyes (McCarthy 2019) in four regions for the human datasets (Figure 1-b): genu, splenium, deep gray matter, and cortical gray matter. Automatic segmentation was not used because the data were single-slice and there was inconsistent slice positioning between datasets.

2.4.3     |     Analysis and Statistics#

Analysis code and scripts were developed and shared in a version-tracked public GitHub repository6. Python-based Jupyter Notebooks were used for both the quality assurance and main analysis workflows. The computational environment requirements were containerized in Docker (Merkel 2014; Boettiger 2015), allowing for an executable environment that can reproduce the analysis in a web browser through MyBinder7 (Project Jupyter et al. 2018). Backend Python files handled reference data, database handling, ROI masking, and general analysis tools, while configuration files managed the dataset information which were downloaded and pooled using a script (make_pooled_datasets.py). The databases were created using a reproducible Jupyter Notebook script and subsequently saved in the repository.

For the NIST phantom data, mean T1 values for each ROI were compared with temperature-corrected reference values and visualized in three different types of plots (linear axes, log-log axes, and error relative to the reference value). This comparison was repeated for individual measurements at each site and for all measurements grouped together. Temperature correction was carried out via interpolation of the set of reference NIST T1 values between 16 °C and 26 °C (2 °C intervals), listed in the phantom technical specifications. For the human datasets, a notebook was created to plot the mean and standard deviations for each tissue ROI from all submissions from all sites. All the quality assurance and analysis plot images were saved to the repository for ease-of-access and a timestamped version-controlled record of the state of the analysis figures. The database files of ROI values and acquisition details for all submissions were also saved to the repository.

An interactive dashboard8 was developed in Dash by Plotly (Plotly Technologies Inc. 2015) and hosted by NeuroLibre (Karakuzu et al. 2022) to enable real-time exploration of the data, analysis, and statistics of the challenge results. The dashboard reports descriptive statistics for a variety of alternative looks at phantom and brain data, as well as some statistical comparisons (e.g., the shift function). The data was collected from the pre-prepared databases of masked ROI values and incorporated other database information, such as phantom version, temperature, MRI manufacturer, and reference values. The interactive dashboard displays these results for all measurements at all sites.

3     |     RESULTS#

3.1     |     Dashboard#

To better disseminate the challenge results, a web-based dashboard was developed (Figure 2, https://rrsg2020.dashboards.neurolibre.org). The landing page (Figure 2-a) showcases the relationship between the phantom and brain datasets acquired at different sites/vendors. Navigating to the phantom section leads to the information about the submitted datasets, such as the mean/std/median/CoV for each sphere, % difference from the reference values, number of scans, and temperature (Figure 2-b, left). Other options allow users to limit the results by specific versions of the phantom or the MRI manufacturer. Selecting either “By Sphere” (Figure 2-b, right) or “By Site” tabs will display whisker plots for the selected options, enabling further exploration of the datasets.

Returning to the home page and selecting the brain section allows exploration of information on the brain datasets (Figure 2-c, left), such as mean T1 and STD for different ROI regions, as well as selection of specific MRI manufacturers. Choosing the “By Regions” tab provides whisker plots of the datasets for the selected ROI (Figure 2-c, right), similar to the plots for the phantom.

FIGURE 2 Dashboard. a) welcome page listing all the sites, the types of subject, and scanner, and the relationship between the three. b) dashboard tabs. Link: https://rrsg2020.dashboards.neurolibre.org

3.2     |     Submissions#

Eighteen submissions were included in the analysis, which resulted in 38 T1 maps of the NIST/system phantom, and 56 brain T1 maps. Figure 3 illustrates all the submissions that acquired phantom data (Figure 3-a) and human data (Figure 3-b), the number of scanners used for each submission, and the number of T1 mapping datasets. It should be noted that these numbers include a subset of measurements where both complex and magnitude-only data from the same acquisition were used to fit T1 maps, thus the total number of unique acquisitions is lower than the numbers reported above. The datasets were collected on three MRI manufacturers (Siemens, GE, Philips) and were acquired at 3.0 T, except for one dataset acquired at 350 mT. To showcase the heterogeneity of the actual T1 map data from the independently-implemented submissions, Figure 4 displays six T1 maps of the phantoms submitted to the challenge.

_images/figure_3_full.png

FIGURE 3 Complete list of the datasets submitted to the challenge. Submissions that included phantom data are shown in a), and those that included human brain data are shown in b). Submissions were assigned numbers to keep track of which submissions included both phantom and human data. Some submissions included datasets acquired on multiple scanners. For the phantom (panel a), each submission acquired all their data using a single phantom, however some submissions shared phantoms with each other (same color). Some additional details about the datasets are included in the T1 maps column, if relevant. Note that for complex datasets in the magnitude/phase format, T1 maps were calculated both using magnitude-only data and complex-data, but these were from the same measurement (branching off arrow).

Of these datasets, several submissions went beyond the minimum acquisition and acquired additional datasets using the NIST phantom, such as a traveling phantom (7 scanners/sites), scan-rescan, same-day rescans on two MRIs, short TR vs long TR, and 4 point TI vs 14 point TI. For humans, one site acquired 13 subjects on two scanners (different manufacturers), one site acquired 6 subjects, and one site acquired a subject using two different head coils (20 channels vs. 64 channels).

from os import path
import os

if path.isdir('analysis')== False:
    !git clone https://github.com/rrsg2020/analysis.git
    dir_name = 'analysis'
    analysis = os.listdir(dir_name)

    for item in analysis:
        if item.endswith(".ipynb"):
            os.remove(os.path.join(dir_name, item))
        if item.endswith(".md"):
            os.remove(os.path.join(dir_name, item))

# Imports

from pathlib import Path
import pandas as pd
import json
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import Video
import glob
from analysis.src.plots import *
from analysis.make_pooled_datasets import *

# Configurations
configFile = Path('analysis/configs/3T_NIST_T1maps.json')
data_folder_name = 'analysis/3T_NIST_T1maps'

configFile_raw = Path('analysis/configs/3T_NIST.json')
data_folder_name_raw = 'analysis/3T_NIST'

# Download datasets
if not Path(data_folder_name).exists():
    make_pooled_dataset(configFile, data_folder_name)

if not Path(data_folder_name_raw).exists():
    make_pooled_dataset(configFile_raw, data_folder_name_raw)

with open(configFile) as json_file:
    configJson = json.load(json_file)

with open(configFile_raw) as json_file:
    configJson_raw = json.load(json_file)
    
def get_image(dataset_name, key2):
    # Load T1 image data
    t1_file = configJson[dataset_name]['datasets'][key2]['imagePath']
    t1 = nib.load(Path(data_folder_name) / t1_file)
    t1_volume = t1.get_fdata() 

    # Load raw image data
    raw_file = configJson_raw[dataset_name]['datasets'][key2]['imagePath']
    raw = nib.load(Path(data_folder_name_raw) / raw_file)
    raw_volume = raw.get_fdata() 

    # Handle 2D vs 3D volume case
    dims = t1_volume.shape
    if (len(dims) == 2) or (np.min(dims) == 1):
        im = np.rot90(t1_volume)
        TI_1 = np.rot90(np.squeeze(raw_volume[:,:,0,0]))
        TI_2 = np.rot90(np.squeeze(raw_volume[:,:,0,1]))
        TI_3 = np.rot90(np.squeeze(raw_volume[:,:,0,2]))
        TI_4 = np.rot90(np.squeeze(raw_volume[:,:,0,3]))    
    else:
        index_smallest_dim = np.argmin(dims)
        numberOfSlices = dims[index_smallest_dim]
        midSlice = int(np.round(numberOfSlices/2))

        if index_smallest_dim == 0:
            im = np.rot90(np.squeeze(t1_volume[midSlice,:,:]))
            TI_1 = np.rot90(np.squeeze(raw_volume[midSlice,:,:,0]))
            TI_2 = np.rot90(np.squeeze(raw_volume[midSlice,:,:,1]))
            TI_3 = np.rot90(np.squeeze(raw_volume[midSlice,:,:,2]))
            TI_4 = np.rot90(np.squeeze(raw_volume[midSlice,:,:,3]))    
        elif index_smallest_dim == 1:
            im = np.rot90(np.squeeze(t1_volume[:,midSlice,:]))
            TI_1 = np.rot90(np.squeeze(raw_volume[:,midSlice,:,0]))
            TI_2 = np.rot90(np.squeeze(raw_volume[:,midSlice,:,1]))
            TI_3 = np.rot90(np.squeeze(raw_volume[:,midSlice,:,2]))
            TI_4 = np.rot90(np.squeeze(raw_volume[:,midSlice,:,3]))
        elif index_smallest_dim == 2:
            im = np.rot90(np.squeeze(t1_volume[:,:,midSlice]))
            TI_1 = np.rot90(np.squeeze(raw_volume[:,:,midSlice,0]))
            TI_2 = np.rot90(np.squeeze(raw_volume[:,:,midSlice,1]))
            TI_3 = np.rot90(np.squeeze(raw_volume[:,:,midSlice,2]))
            TI_4 = np.rot90(np.squeeze(raw_volume[:,:,midSlice,3]))

    xAxis = np.linspace(0,im.shape[0]-1, num=im.shape[0])
    yAxis = np.linspace(0,im.shape[1]-1, num=im.shape[1])
    return im, xAxis, yAxis, TI_1, TI_2, TI_3, TI_4


im_1, xAxis_1, yAxis_1, TI_1_1, TI_2_1, TI_3_1, TI_4_1 = get_image('wang_MDAnderson_NIST', 'day2_mag')

im_2, xAxis_2, yAxis_2, TI_1_2, TI_2_2, TI_3_2, TI_4_2 = get_image('CStehningPhilipsClinicalScienceGermany_NIST', 'Bonn_MR1_magnitude')

im_3, xAxis_3, yAxis_3, TI_1_3, TI_2_3, TI_3_3, TI_4_3 = get_image('mrel_usc_NIST', 'Session1_MR1')

im_4, xAxis_4, yAxis_4, TI_1_4, TI_2_4, TI_3_4, TI_4_4 = get_image('karakuzu_polymtl_NIST', 'mni')

im_5, xAxis_5, yAxis_5, TI_1_5, TI_2_5, TI_3_5, TI_4_5 = get_image('madelinecarr_lha_NIST', 'one')

im_6, xAxis_6, yAxis_6, TI_1_6, TI_2_6, TI_3_6, TI_4_6 = get_image('matthewgrechsollars_ICL_NIST', 'magnitude')
im_6 = np.flipud(im_6)
TI_1_6 = np.flipud(TI_1_6)
TI_2_6 = np.flipud(TI_2_6)
TI_3_6 = np.flipud(TI_3_6)
TI_4_6 = np.flipud(TI_4_6)

FIGURE 4 Example T1 maps that were submitted. Note the differences in acquisitions (e.g. FOV (top middle), orientation (bottom right, k-space pattern (top left and right) and resulting artifacts in the T1 maps (e.g. ghosting (bottom left), ringing (bottom middle), noise profiles (top left and bottom right), deformation/slice mispositioning (top right)) resulting from the independently-implemented acquisition protocols.

3.2     |     Phantom#

An overview of the T1 results for the submitted NIST phantom datasets are displayed in Figure 5. The same data is presented in each column with different axes types (linear, log, and error) to better visualize the results. The left column (a,d) shows the mean T1 with their standard deviations in each of the 14 ROIs are plotted against temperature-corrected reference T1 values using linear axes for a representative dataset (a) and all datasets (d). The middle column (b,e) displays the same mean T1 datasets as (a,d) but using log-log axes. The right column (c,f) displays the error (%) of the measured T1 relative to the temperature-corrected NIST reference values; the dotted lines represent a ±10% error. Figure 5-a shows good agreement (slope = 0.98, intercept = -14 ms) for this dataset in comparison to the reference T1 values. However, this trend breaks down for low T1 values (T1 < 100-200 ms), as seen in the log-log plot (Figure 5-b), which was expected because the imaging protocol is optimized for human T1 values (T1 > 500 ms). Errors exceeding 10% are observed for T1 values of phantom spheres below this threshold (Figure 5-c). These trends are observed for the entire-dataset plots as well (Figure 5 d-f). More variability is seen in Figure 5-d around the identity diagonal at very high T1 (T1 ~ 2000 ms) than towards the WM-GM values (T1 ~ 600-1400 ms), which is less apparent in the log-log plot (Figure 5-e). In addition to the low T1 values exceeding the 10% error threshold (Figure 5-f), a few measurements with outlier values (~3-4) were observed in the human tissue range.

from os import path
import os

if path.isdir('analysis')== False:
    !git clone https://github.com/rrsg2020/analysis.git
    dir_name = 'analysis'
    analysis = os.listdir(dir_name)

    for item in analysis:
        if item.endswith(".ipynb"):
            os.remove(os.path.join(dir_name, item))
        if item.endswith(".md"):
            os.remove(os.path.join(dir_name, item))

# Imports
import warnings
warnings.filterwarnings("ignore")

from pathlib import Path
import pandas as pd
import json
import nibabel as nib
import numpy as np

from analysis.src.database import *
from analysis.src.nist import get_reference_NIST_values, get_NIST_ids
from analysis.src.tools import calc_error
from analysis.src.nist import temperature_correction

import matplotlib.pyplot as plt
plt.style.use('analysis/custom_matplotlibrc')
plt.rcParams["figure.figsize"] = (10,10)
fig_id = 0

database_path = Path('analysis/databases/3T_NIST_T1maps_database.pkl')
output_folder = Path("analysis/plots/03_singledataset_scatter_NIST-temperature-corrected/")

estimate_type = 'mean' # median or mean

## Define Functions
def plot_single_scatter(x, y, y_std,
                        title, x_label, y_label,
                        file_prefix, folder_path, fig_id,
                        y_type='linear'):
    if y_type is 'linear':
        plt.errorbar(x,y, y_std, fmt='o', solid_capstyle='projecting')
        ax = plt.gca()
        ax.axline((1, 1), slope=1, linestyle='dashed')
        ax.set_ylim(ymin=0, ymax=2500)
        ax.set_xlim(xmin=0, xmax=2500)
    if y_type is 'log':
        plt.loglog(x,y,'o')
        ax = plt.gca()
        ax.set_ylim(ymin=20, ymax=2500)
        ax.set_xlim(xmin=20, xmax=2500)
    if y_type is 'error_t1':
        plt.errorbar(x,calc_error(x,y), fmt='o')
        ax = plt.gca()
        ax.axline((1, 0), slope=0, color='k')
        ax.axline((1, -10), slope=0, linestyle='dashed', color='k')
        ax.axline((1, 10), slope=0, linestyle='dashed', color='k')
        ax.set_ylim(ymin=-100, ymax=100)
        ax.set_xlim(xmin=0, xmax=2500)


    
    plt.title(title)
    plt.xlabel(x_label)
    plt.ylabel(y_label)

    fig = plt.gcf()
    

    folder_path.mkdir(parents=True, exist_ok=True)

    if fig_id<10:
        filename = "0" + str(fig_id) + "_" + file_prefix
    else:
        filename = str(fig_id) + "_" + file_prefix

    fig.savefig(folder_path / (str(filename) + '.svg'), facecolor='white')
    fig.savefig(folder_path / (str(filename) + '.png'), facecolor='white')
    fig_id = fig_id + 1
    plt.show()
    return fig_id

## Load database

df = pd.read_pickle(database_path)

## Initialize array

dataset_estimate = np.array([])
dataset_std = np.array([])

index = 6.001

serial_number = df.loc[index]['phantom serial number']


for key in get_NIST_ids():
    if estimate_type == 'mean':
        dataset_estimate = np.append(dataset_estimate, np.mean(df.loc[index][key]))
    elif estimate_type == 'median':
        dataset_estimate = np.append(dataset_estimate, np.median(df.loc[index][key]))
    else:
        Exception('Unsupported dataset estimate type.')

    dataset_std = np.append(dataset_std, np.std(df.loc[index][key]))

ref_values = get_reference_NIST_values(serial_number)

temperature = df.loc[index]['phantom temperature']
temp_corrected_ref_values = temperature_correction(temperature, serial_number)


# PYTHON CODE
# Module imports

import matplotlib.pyplot as plt
from PIL import Image
from matplotlib.image import imread
import scipy.io
import plotly.graph_objs as go
import numpy as np
from plotly import __version__
from plotly.offline import init_notebook_mode, iplot, plot
config={'showLink': False, 'displayModeBar': False}

init_notebook_mode(connected=True)

from IPython.display import display, HTML

import os
import markdown
import random
from scipy.integrate import quad

import warnings
warnings.filterwarnings('ignore')
config={'showLink': False, 'displayModeBar': False}

lin_line = go.Scatter(
    x=np.linspace(0,2500),
    y=np.linspace(0,2500),
    name="slope",
    line_shape='linear',
    line={'dash': 'dash','color': 'black'},
    visible = True,
    showlegend = False
)

data_lin=go.Scatter(
    x=temp_corrected_ref_values,
    y=dataset_estimate,
    error_y=dict(
        type='data', # value of error bar given in data coordinates
        array=dataset_std,
        visible=True),
    name = 'id: '+ str(index),
    mode = 'markers',
    marker=dict(color='#007ea7'),
    visible = True,
    showlegend = False,
    )

data_log=go.Scatter(
    x=temp_corrected_ref_values,
    y=dataset_estimate,
    name = 'id: '+ str(index),
    mode = 'markers',
    marker=dict(color='#007ea7'),
    visible = True,
    showlegend = False
    )
data_error=go.Scatter(
    x=temp_corrected_ref_values,
    y= calc_error(temp_corrected_ref_values,dataset_estimate),
    name = 'id: '+ str(index),
    mode = 'markers',
    marker=dict(color='#007ea7'),
    visible = True,
    showlegend = False
    )
err_solid_line = go.Scatter(
    x=np.linspace(0,2500),
    y=np.linspace(0,2500)*0,
    name="0 % error line",
    line_shape='linear',
    line={'dash': 'solid','color': 'black'},
    visible = True,
    showlegend = False
)
err_p10_line = go.Scatter(
    x=np.linspace(0,2500),
    y=np.linspace(0,2500)*0+10,
    name="+10% error",
    line_shape='linear',
    line={'dash': 'dash','color': 'black'},
    visible = True,
    showlegend = False
)
err_m10_line = go.Scatter(
    x=np.linspace(0,2500),
    y=np.linspace(0,2500)*0-10,
    name="-10% error",
    line_shape='linear',
    line={'dash': 'dash','color': 'black'},
    visible = True,
    showlegend = False
)

data = [lin_line, data_lin, data_log, data_error, err_solid_line, err_p10_line, err_m10_line]

from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(rows=2, cols=3, horizontal_spacing = 0.08)

fig.add_trace(
    lin_line,
    row=1, col=1
)

fig.add_trace(
    data_lin,
    row=1, col=1
)

fig.add_trace(
    data_log,
    row=1, col=2
)

fig.add_trace(
    data_error,
    row=1, col=3
)

fig.add_trace(
    err_solid_line,
    row=1, col=3
)

fig.add_trace(
    err_p10_line,
    row=1, col=3
)

fig.add_trace(
    err_m10_line,
    row=1, col=3
)

output_folder = Path("analysis/plots/04_alldatasets_scatter_NIST-temperature-corrected/")

## Initialize array

dataset_mean = np.zeros((1,14))
dataset_std = np.zeros((1,14))
version = np.array([])
temperature = np.array([])
ref_values = np.zeros((1,14))


ii=0
for index, row in df.iterrows():
    if type(df.loc[index]['T1 - NIST sphere 1']) is np.ndarray:

        version = np.append(version,df.loc[index]['phantom serial number'])
        temperature = np.append(temperature, df.loc[index]['phantom temperature'])


        if version[ii] is None:
            version[ii] = 999 # Missing version, only known case is one where we have version > 42 right now.
        
        if temperature[ii] is None:
            temperature[ii] = 20 # Missing temperature, assume it to be 20C (reference temperature).
            
            
        if ii==0:
            ref_values = get_reference_NIST_values(version[ii])
            temp_corrected_ref_values = temperature_correction(temperature[ii], version[ii])
        else:
            ref_values = np.vstack((ref_values, get_reference_NIST_values(version[ii])))
            temp_corrected_ref_values = np.vstack((temp_corrected_ref_values, temperature_correction(temperature[ii], version[ii])))
        
        tmp_dataset_estimate = np.array([])
        tmp_dataset_std = np.array([])

        for key in get_NIST_ids():
            if estimate_type is 'mean':
                tmp_dataset_estimate = np.append(tmp_dataset_estimate, np.mean(df.loc[index][key]))
            elif estimate_type is 'median':
                tmp_dataset_estimate = np.append(tmp_dataset_estimate, np.median(df.loc[index][key]))
            else:
                Exception('Unsupported dataset estimate type.')

            tmp_dataset_std = np.append(tmp_dataset_std, np.std(df.loc[index][key]))

        if ii==0:
            dataset_estimate = tmp_dataset_estimate  
            dataset_std = tmp_dataset_std
        else:
            dataset_estimate = np.vstack((dataset_estimate, tmp_dataset_estimate))
            dataset_std = np.vstack((dataset_std, tmp_dataset_std))

        ii=ii+1

## Setup for plots
fig_id = 0
dims=ref_values.shape
file_prefix = 'alldatasets'

fig.add_trace(
    lin_line,
    row=2, col=1
)

for ii in range(dims[0]):
    data_lin=go.Scatter(
        x=temp_corrected_ref_values[ii,:],
        y=dataset_estimate[ii,:],
        error_y=dict(
            type='data', # value of error bar given in data coordinates
            array=dataset_std[ii,:],
            visible=True),
        name = 'id: '+ str(df.index[ii]),
        mode = 'markers',
        visible = True,
        showlegend = False,
        )

    fig.add_trace(
        data_lin,
        row=2, col=1
    )


for ii in range(dims[0]):
    data_log=go.Scatter(
        x=temp_corrected_ref_values[ii,:],
        y=dataset_estimate[ii,:],
        name = 'id: '+ str(df.index[ii]),
        mode = 'markers',
        visible = True,
        showlegend = False
        )

    fig.add_trace(
        data_log,
        row=2, col=2
    )

for ii in range(dims[0]):
    data_error=go.Scatter(
        x=temp_corrected_ref_values[ii,:],
        y= calc_error(temp_corrected_ref_values[ii,:],dataset_estimate[ii,:]),
        name = 'id: '+ str(df.index[ii]),
        mode = 'markers',
        visible = True,
        showlegend = False
        )

    fig.add_trace(
        data_error,
        row=2, col=3
    )

fig.add_trace(
    err_solid_line,
    row=2, col=3
)

fig.add_trace(
    err_p10_line,
    row=2, col=3
)

fig.add_trace(
    err_m10_line,
    row=2, col=3
)

fig.update_xaxes(
    type="linear",
    range=[0,2500],
    title='Reference T<sub>1</sub> (ms)',
    showgrid=True,
    gridcolor='rgb(169,169,169)',
    tick0 = 0,
    dtick = 500,
    linecolor='black',
    linewidth=2,
    row=1, col=1
    )
fig.update_yaxes(
    type="linear",
    range=[0,2500],
    title={
        'text':'T<sub>1</sub> estimate (ms)',
        'standoff':0
        },
    showgrid=True,
    tick0 = 0,
    dtick = 500,
    gridcolor='rgb(169,169,169)',
    linecolor='black',
    linewidth=2,
    row=1, col=1
    )

fig.update_xaxes(
    type="log",
    range=[np.log10(20),np.log10(2500)],
    title='Reference T<sub>1</sub> (ms)',
    showgrid=True,
    gridcolor='rgb(169,169,169)',
    minor=dict(ticks="inside", ticklen=6, showgrid=True),
    linecolor='black',
    linewidth=2,
    row=1, col=2
    )
fig.update_yaxes(
    type="log",
    range=[np.log10(20),np.log10(2500)],
    title={
        'text':'T<sub>1</sub> estimate (ms)',
        'standoff':0
        },
    showgrid=True,
    gridcolor='rgb(169,169,169)',
    minor=dict(ticks="inside", ticklen=6, showgrid=True),
    linecolor='black',
    linewidth=2,
    row=1, col=2)

fig.update_xaxes(
    type="linear",
    range=[0,2500],
    title='Reference T<sub>1</sub> (ms)',
    showgrid=True,
    gridcolor='rgb(169,169,169)',
    tick0 = 0,
    dtick = 500,
    linecolor='black',
    linewidth=2,
    row=1, col=3)

fig.update_yaxes(
    type="linear",
    range=[-100,100],
    title={
        'text':'Error (%)',
        'standoff':0
        },
    showgrid=True,
    gridcolor='rgb(169,169,169)',
    tick0 = -100,
    dtick = 25,
    linecolor='black',
    linewidth=2,
    row=1, col=3
    )


fig.update_xaxes(
    type="linear",
    range=[0,2500],
    title='Reference T<sub>1</sub> (ms)',
    showgrid=True,
    gridcolor='rgb(169,169,169)',
    tick0 = 0,
    dtick = 500,
    linecolor='black',
    linewidth=2,
    row=2, col=1
    )
fig.update_yaxes(
    type="linear",
    range=[0,2500],
    title={
        'text':'T<sub>1</sub> estimate (ms)',
        'standoff':0
        },
    showgrid=True,
    gridcolor='rgb(169,169,169)',
    tick0 = 0,
    dtick = 500,
    linecolor='black',
    linewidth=2,
    row=2, col=1
    )

fig.update_xaxes(
    type="log",
    range=[np.log10(20),np.log10(2500)],
    title='Reference T<sub>1</sub> (ms)',
    showgrid=True,
    gridcolor='rgb(169,169,169)',
    minor=dict(ticks="inside", ticklen=6, showgrid=True),
    linecolor='black',
    linewidth=2,
    row=2, col=2
    )
fig.update_yaxes(
    type="log",
    range=[np.log10(20),np.log10(2500)],
    title={
        'text':'T<sub>1</sub> estimate (ms)',
        'standoff':0
        },
    showgrid=True,
    gridcolor='rgb(169,169,169)',
    minor=dict(ticks="inside", ticklen=6, showgrid=True),
    linecolor='black',
    linewidth=2,
    row=2, col=2)

fig.update_xaxes(
    type="linear",
    range=[0,2500],
    title='Reference T<sub>1</sub> (ms)',
    showgrid=True,
    gridcolor='rgb(169,169,169)',
    tick0 = 0,
    dtick = 500,
    linecolor='black',
    linewidth=2,
    row=2, col=3)

fig.update_yaxes(
    type="linear",
    range=[-100,100],
    title={
        'text':'Error (%)',
        'standoff':0
        },
    showgrid=True,
    gridcolor='rgb(169,169,169)',
    tick0 = -100,
    dtick = 25,
    linecolor='black',
    linewidth=2,
    row=2, col=3,
    )

fig.update_layout(
    margin=dict(l=30, r=30, t=10, b=30),
    paper_bgcolor='rgb(255, 255, 255)',
    plot_bgcolor='rgb(255, 255, 255)',
    legend_title="",
    annotations=[
        dict(
            x=-0.05,
            y=0.53,
            showarrow=False,
            text='<b>a</b>',
            font=dict(
                family='Times New Roman',
                size=48
            ),
            xref='paper',
            yref='paper'
        ),
        dict(
            x=0.32,
            y=0.53,
            showarrow=False,
            text='<b>b</b>',
            font=dict(
                family='Times New Roman',
                size=48
            ),
            xref='paper',
            yref='paper'
        ),
        dict(
            x=0.7,
            y=0.53,
            showarrow=False,
            text='<b>c</b>',
            font=dict(
                family='Times New Roman',
                size=48
            ),
            xref='paper',
            yref='paper'
        ),
        dict(
            x=-0.05,
            y=-0.11,
            showarrow=False,
            text='<b>d</b>',
            font=dict(
                family='Times New Roman',
                size=48
            ),
            xref='paper',
            yref='paper'
        ),
        dict(
            x=0.32,
            y=-0.11,
            showarrow=False,
            text='<b>e</b>',
            font=dict(
                family='Times New Roman',
                size=48
            ),
            xref='paper',
            yref='paper'
        ),
        dict(
            x=0.7,
            y=-0.11,
            showarrow=False,
            text='<b>f</b>',
            font=dict(
                family='Times New Roman',
                size=48
            ),
            xref='paper',
            yref='paper'
        ),
        ]
    )

fig.update_layout(height=600, width=960)

#iplot(fig, filename = 'figure3', config = config)
plot(fig, filename = 'figure3.html', config = config)
display(HTML('figure3.html'))

FIGURE 5 Measured mean T1 values vs. temperature-corrected NIST reference values of the phantom spheres presented as linear plots (a,d), log-log plots (b,e), and plots of the error relative to reference T1 value. Plots (a–c) are of an example single dataset, whereas plots (d–f) are of all acquired datasets.

Inter-participant coefficient of variations (CoV) were calculated by selecting one single T1 map submitted per challenge participant9 and calculating the CoV of the T1 means per sphere. The average inter-participant CoV across the first five spheres representing the expected range in the human brain was 6.1 % (sphere 1 = 4.7 %, sphere 2 = 3.1 %, sphere 3 = 6.3 %, sphere 4 = 12.8 %, sphere 5 = 7.3 %). Two sites were clear outliers that had particular issues for sphere 4, likely due to a combination of an implementation error and a resulting uncertainty of where the signal null lies for a particular T1 value; by removing these outliers, the mean inter-participant CoV reduces to 4.1 % (sphere 1 = 5.4 %, sphere 2 = 3. 5%, sphere 3 = 2.5 %, sphere 4 = 4.2 %, sphere 5 = 4.9 %). One participant measured T1 maps with one phantom using one implemented protocol at 7 different sites using a single manufacturer, and so a mean intra-participant CoV across the first five spheres for this case was calculated to be 2.9 % (sphere 1 = 4.9 %, sphere 2 = 3.5 %, sphere 3 = 2.6 %, sphere 4 = 2.0 %, sphere 5 = 1.6 %).

FIGURE 6 Scatter plot comparing complex and magnitude-only fitted data. The markers are color-coded based on the implementation site, while their size represents the percent difference calculated between the two fitting methods (magnitude or complex) for that datapoint.

Figure 6 compared the mean T1 values measured using complex and magnitude-only data for the 11 datasets where authors provided both in their submissions. Note that these datasets are from the same acquisition, not two unique acquisitions. The scatter plot shows that for the range of T1 values expected in the brain (T1 > 500 ms), there is almost no difference in fitted T1 values between the two types of data (the highest outlier indicates ~9ms difference). However, for T1 values less than ~250 ms, there are large errors (please see the dashboard), which are likely due to poor fitting using a protocol that is not optimized for this range of T1 values.

FIGURE 7 This figure presents a hierarchical analysis of T estimation error across voxel-wise distributions in spheres 1-6, using 20 measurements split into 9 quantiles (q1-q9). Each panel shows individual shift functions for each measurement (colored by vendor) in the top row, which characterize the percent measurement error as either overestimation or underestimation. The bottom row in each panel (gray markers) displays the average trend of bootstrapped differences at each decile in milliseconds. The trends highlight any notable common patterns at the respective decile, such as a 39ms median underestimation trend in Sphere 3. Straight lines in the top row indicate a homogeneous measurement error across voxels. High-density intervals not intersecting the zero crossing indicate a significant trend at the respective decile.

​ The direction of the measurement error in the phantom is influenced by both the measurement site and the reference value, as indicated by the individual shift functions (Figure 7). For example, at sphere 1 (~2000 ms), nearly half of the measurements (20 shown in total) are positioned on each side of the zero-crossing. On the other hand, for sphere 3 (~1s), nearly all the measurements show underestimation as shift functions are located below the zero-crossing. Bootstrapped differences capture these trends, indicating a dominant overestimation at sphere 1 (median difference of +17ms) and underestimation at sphere 3 (median difference of -39ms). High-density intervals associated with these median differences do not indicate a common pattern for the former (intervals cross zero), whereas they reveal a notable underestimation trend at sphere 3 (intervals do not include zero). A similar common pattern is also observed for sphere 2 (median overestimation of 35ms). In addition, the shape of individual shift functions conveys information about how voxel-wise distributions differ. For example, curved lines in sphere 2 from two different sites reveal that some of the (ROI selected) voxels show drastically higher underestimation that cannot be captured by comparisons of central tendency alone. Lastly, the spread of shift functions around the zero-crossing does not indicate vendor-specific clustering for the selected measurements and reference values.

3.3     |     Human#

Figure 8 summarizes the results from human datasets submitted to this challenge, showing mean and standard deviation T1 values from the WM (genu) and GM (cerebral cortex) ROIs. The top plot collapses all datasets for each site, while the bottom plot shows each dataset separately. Mean WM T1 values across all submissions was 828 ± 38 ms in the genu and 854 ± 50 ms in the splenium, and mean GM T1 values were 1548 ± 156 ms in the cortex and 1188 ± 133 ms in the deep GM, with less variations overall in WM compared to GM, possibly due to better ROI placement and less partial voluming in WM. Inter-participant coefficients of variation (CoV) for independently-implemented imaging protocols were calculated using one T1 map measurement per submission that most closely matched the proposed protocol, and were 6.0% for genu, 11% for splenium, 16% for cortical GM and 22% for deep GM. One site (site 9) measured multiple subjects on three scanners using two different vendors, and so intra-participant CoVs for these centrally-implemented protocols were calculated over acquired T1 maps from this site, and were 2.9% for genu, 3.5% for splenium, 6.9 % for cortical GM and 7.8% for deep GM. It’s important to note that this site also had the best slice positioning, cutting through the AC-PC line and genu for proper ROI placement, particularly for the corpus callosum and deep GM.

from os import path
import os

if path.isdir('analysis')== False:
    !git clone https://github.com/rrsg2020/analysis.git
    dir_name = 'analysis'
    analysis = os.listdir(dir_name)

    for item in analysis:
        if item.endswith(".ipynb"):
            os.remove(os.path.join(dir_name, item))
        if item.endswith(".md"):
            os.remove(os.path.join(dir_name, item))

# Imports
import warnings
warnings.filterwarnings("ignore")

from pathlib import Path
import pandas as pd
import nibabel as nib
import numpy as np

from analysis.src.database import *
import matplotlib.pyplot as plt
plt.style.use('analysis/custom_matplotlibrc')
plt.rcParams["figure.figsize"] = (20,5)
fig_id = 0

# Configurations

database_path = Path('analysis/databases/3T_human_T1maps_database.pkl')
output_folder = Path("analysis/plots/08_wholedataset_scatter_Human/")

estimate_type = 'mean' # median or mean

# Define functions

def plot_both_scatter(x1, x2, y, y_std,
                      title, x1_label, x2_label, y_label,
                      file_prefix, folder_path, fig_id):
    
    plt.rcParams["figure.figsize"] = (20,10)

    fig, axs = plt.subplots(2)
    fig.suptitle(title)
    axs[0].errorbar(x1, y, y_std, fmt='o', solid_capstyle='projecting')
    axs[0].set_xlabel(x1_label)
    axs[0].set_ylabel(y_label)
    axs[0].set_xticks(np.arange(0, np.max(x1), step=1))


    axs[1].errorbar(x2, y, y_std, fmt='o', solid_capstyle='projecting')
    axs[1].set_xlabel(x2_label)
    axs[1].set_ylabel(y_label)
    axs[1].set_xticklabels(labels=x2, rotation=90)


    if fig_id<10:
        filename = "0" + str(fig_id) + "_" + file_prefix
    else:
        filename = str(fig_id) + "_" + file_prefix

    fig.savefig(folder_path / (str(filename) + '.svg'), facecolor='white')
    fig.savefig(folder_path / (str(filename) + '.png'), facecolor='white')
    fig_id = fig_id + 1
    plt.show()
    return fig_id

# Load database

df = pd.read_pickle(database_path)

genu_estimate = np.array([])
genu_std = np.array([])
splenium_estimate = np.array([])
splenium_std = np.array([])
deepgm_estimate = np.array([])
deepgm_std = np.array([])
cgm_estimate = np.array([])
cgm_std = np.array([])

ii = 0
for index, row in df.iterrows():
    
    if estimate_type is 'mean':
        genu_estimate = np.append(genu_estimate, np.mean(df.loc[index]['T1 - genu (WM)']))
        genu_std = np.append(genu_std, np.std(df.loc[index]['T1 - genu (WM)']))
        splenium_estimate = np.append(splenium_estimate, np.mean(df.loc[index]['T1 - splenium (WM)']))
        splenium_std = np.append(splenium_std, np.std(df.loc[index]['T1 - splenium (WM)']))
        deepgm_estimate = np.append(deepgm_estimate, np.mean(df.loc[index]['T1 - deep GM']))
        deepgm_std = np.append(deepgm_std, np.std(df.loc[index]['T1 - deep GM']))
        cgm_estimate = np.append(cgm_estimate, np.mean(df.loc[index]['T1 - cortical GM']))
        cgm_std = np.append(cgm_std, np.std(df.loc[index]['T1 - cortical GM']))
    elif estimate_type is 'median':
        genu_estimate = np.append(genu_estimate, np.median(df.loc[index]['T1 - genu (WM)']))
        genu_std = np.append(genu_std, np.std(df.loc[index]['T1 - genu (WM)']))
        splenium_estimate = np.append(splenium_estimate, np.median(df.loc[index]['T1 - splenium (WM)']))
        splenium_std = np.append(splenium_std, np.std(df.loc[index]['T1 - splenium (WM)']))
        deepgm_estimate = np.append(deepgm_estimate, np.median(df.loc[index]['T1 - deep GM']))
        deepgm_std = np.append(deepgm_std, np.std(df.loc[index]['T1 - deep GM']))
        cgm_estimate = np.append(cgm_estimate, np.median(df.loc[index]['T1 - cortical GM']))
        cgm_std = np.append(cgm_std, np.std(df.loc[index]['T1 - cortical GM']))
    else:
        Exception('Unsupported dataset estimate type.')
    ii = ii +1

# Store the IDs
indexes_numbers = df.index
indexes_strings = indexes_numbers.map(str)

x1_label='Site #'
x2_label='Site #.Meas #'
y_label="T$_1$ (ms)"
file_prefix = 'WM_and_GM'
folder_path=output_folder

x1=indexes_numbers
x2=indexes_strings
y=genu_estimate
y_std=genu_std

# Paper formatting of x tick labels (remove leading zero, pad zero at the end for multiples of 10)
x3=[]
for num in x2:
    x3.append(num.replace('.0', '.'))

index=0
for num in x3:
    if num[-3] != '.':
        x3[index]=num+'0'
    index+=1

FIGURE 8 Mean T1 values in two sets of ROIs, white matter (one 5x5 voxel ROI, genu) and gray matter (three 3x3 voxel ROIs, cortex). Top figure shows all datasets collapsed into sites, whereas the bottom shows each individual dataset.

4     |     DISCUSSION#

4.1     |     Achievements of the challenge#

The challenge focused on exploring the reproducibility of the gold standard inversion recovery T1 mapping method reported in a seminal paper (Barral et al. 2010). Eighteen submissions independently implemented the inversion recovery T1 mapping acquisition protocol as outlined in (Barral et al. 2010) (which is optimized for the T1 values observed in brain tissue), and measured T1 mapping data in a standard quantitative MRI phantom and/or human brains at 27 MRI sites, using three different vendors (GE, Philips, Siemens). The collaborative effort produced an open-source database of 94 T1 mapping datasets, including 38 ISMRM/NIST phantom and 56 human brain datasets. A standardized T1 processing pipeline was developed for different dataset types, including magnitude-only and complex data. Additionally, Jupyter notebooks that can be executed in containerized environments were developed for quality assurance visualization and analyses. An interactive web-based dashboard was also developed to allow for easy exploration of the challenge results in a web-browser.

The paper, (Barral et al. 2010), also shared the necessary code to robustly process inversion recovery T1 mapping data, and is particularly effective even when using a shorter TR (TR ~ T1). To evaluate the accuracy of the resulting T1 values, the challenge used the standard ISMRM/NIST phantom with fiducial spheres having T1 values in the range of human brain tissue, from 500 to 2000 ms (see Figure 5). As anticipated for this protocol, there was a decrease in the accuracy in measurements for spheres with T1 below 300 ms. Overall, the majority of the independently implemented imaging protocols from various sites resulted in good T1 agreement, with only a few exceptions. Using the NIST phantom, we report that sites that independently implemented the imaging protocol resulted in an inter-participant mean CoV (6.1 %) that was twice as high as the intra-participant mean CoV measured at seven sites (2.9 %). A similar trend was observed in vivo. Inter-participant CoV for WM (genu) was 6.0 % and GM (cortex) was 16.5 % vs the intra-participant CoV that was 2.9 % and 6.9%, with generally higher CoVs relative to the phantom measurements likely due to biological variability.

4.2     |     Comparison with other studies#

The work done during this challenge involved a multi-center quantitative T1 mapping study using the NIST phantom across various sites. This work overlaps with two recent studies (Bane et al. 2018; Keenan et al. 2021). (Bane et al. 2018) focused on the reproducibility of two standard quantitative T1 techniques (inversion recovery and variable flip angle) and a wide variety of site-specific T1 mapping protocols for DCE, mostly shorter VFA protocols, which were implemented at eight imaging centers covering the same 3 MRI vendors as were submitted to the challenge GE/Philips/Siemens). The inter-platform coefficient of variation for the standard inversion recovery T1 protocol was 5.46% at 3.0 T in (Bane et al. 2018), which was substantially lower than what they observed for the standard VFA protocol (22.87%). However, Bane et al.’s work differed from the challenge in several ways. First, the standard imaging protocol for inversion recovery used by Bane et al. had more inversion times (14 compared to the challenge’s 4) to cover the entire range of T1 values of the phantom. Secondly, Bane et al. used a single traveling phantom for all sites, whereas the challenge used a total of 8 different phantoms (some were shared amongst people who participated independently). Thirdly, Bane et al. averaged the signals within each ROI of each sphere prior to fitting for the T1 values, whereas the challenge pipeline fits the T1 values in a per-voxel basis and only subsequently calculates the mean/median/std. They also only acquired magnitude data, in contrast to the challenge where participants were encouraged to submit both complex and magnitude-only data. Lastly, the implementations of the common inversion recovery protocols were fully standardized (full protocol) across all the platforms (except for two exceptions where one manufacturer couldn’t achieve the lowest TI) and imposed and coordinated by the principal researchers. In contrast, the challenge sought to explore the variations that would occur for a less-restricted protocol (Table 2) that is independently-implemented at multiple centers, which more closely emulates the quantitative MR research flow (publication of a technique and protocol → independently implement the pulse sequence and/or protocol → use the new implementation independently in a study → publish). Of note, in the challenge, one participating group submitted large multicenter datasets they coordinated themselves which emulates the question studied by (Bane et al. 2018) by imaging a single phantom across at 7 different imaging sites, albeit doing so on a single manufacturer. Using this subset, the mean cross-site CoV was 2.9 % (range: 1.6 - 4.9 %) for the first five spheres, which is in agreement with the range of observations for all spheres by (Bane et al. 2018) at 3T using their full inversion recovery protocol (CoV = 5.46 %; range: 0.99 - 14.6 %).

Another study, (Keenan et al. 2021), also investigated the accuracy of T1 mapping techniques using a single ISMRM/NIST system phantom at multiple sites and on multiple platforms. Like (Bane et al. 2018), they used an inversion recovery imaging protocol optimized for the full range of T1 values represented in the ISMRM/NIST phantom, which consisted of 9 to 10 inversion times and a TR of 4500 ms (TR ~ 5T1 of WM). They reported no consistent pattern of differences in measured inversion recovery T1 values across MRI vendors for the two T1 mapping techniques they used (inversion recovery and VFA). They observed relative errors between their T1 measurements and the reference values of the phantom to be below 10% for all T1 values and highest at the lowest and highest T1 values of the phantom.

4.3     |     Lessons Learned and Future Directions#

There are some important things to note about this challenge. Firstly, we’d like to highlight that the submissions for this challenge were due in March 2020, which was at the height of the early days of COVID-19 pandemic lockdowns. Despite this, many participants were able to submit their datasets and continued to work on the challenge, so we’d like to extend our gratitude to them. The lockdowns did pose some challenges, as some groups intended on acquiring more data, and others intended on submitting rescan data that they could no longer do.

This reproducibility challenge did not aim to determine a winning submission based on the most accurate measurements using the protocol. Instead, it aimed to compare differences between independently-implemented protocols. Crowning a winner was not feasible due to concerns that participants may have micro-optimized their inversion recovery protocols, leading to a broader range of differences in implementations across MRI vendors. Reflecting on this issue, one solution could have been to ask participants to acquire two datasets using the phantom – the original inversion recovery protocol by (Barral et al. 2010), and a second T1 mapping measurement using the rapid technique of their choice.The winner would have been the submission whose rapid T1 map closely matched the original inversion recovery protocol, while also being acquired in a clinically feasible time. This approach could help future qMRI reproducibility challenges in the future.

5     |     CONCLUSION#

The 2020 Reproducibility Challenge, jointly organized by the Reproducible Research and Quantitative MR ISMRM study groups, led to the creation of a large open database of standard quantitative MR phantom and human brain inversion recovery T1 maps. These maps were measured using independently implemented imaging protocols on three different scanner manufacturers. All collected data, processing pipeline code, computational environment files, and analysis scripts were shared with the goal of promoting reproducible research practices, and an interactive dashboard was developed to broaden the accessibility and engagement of the resulting datasets (https://rrsg2020.dashboards.neurolibre.org). The differences in stability between independently implemented (inter-participant) and centrally shared (intra-participant) protocols observed both in phantoms and in vivo could help inform future meta-analyses of quantitative MRI metrics (Mancini et al. 2020; Lazari and Lipp 2021) and better guide multi-center collaborations.

ACKNOWLEDGEMENT#

The conception of this collaborative reproducibility challenge originated from discussions with experts, including Paul Tofts, Joëlle Barral, and Ilana Leppert, who provided valuable insights. Additionally, Kathryn Keenan, Zydrunas Gimbutas, and Andrew Dienstfrey from NIST provided their code to generate the ROI template for the NIST phantom. Dylan Roskams-Edris and Gabriel Pelletier from the Tanenbaum Open Science Institute (TOSI) offered valuable insights and guidance related to data ethics and data sharing in the context of this international multi-center conference challenge. The 2020 RRSG study group committee members who launched the challenge, Martin Uecker, Florian Knoll, Nikola Stikov, Maria Eugenia Caligiuri, and Daniel Gallichan, as well as the 2020 qMRSG committee members, Kathryn Keenan, Diego Hernando, Xavier Golay, Annie Yuxin Zhang, and Jeff Gunter, also played an essential role in making this challenge possible. Finally, we extend our thanks to all the volunteers and individuals who helped with the scanning at each imaging site.

DATA AVAILABILITY STATEMENT#

All imaging data submitted to the challenge, dataset details, registered ROI maps, and processed T1 maps are hosted on OSF https://osf.io/ywc9g/. The dataset submissions and quality assurance were handled through GitHub issues in this repository https://github.com/rrsg2020/data_submission. Note that accepted submissions are closed issues, and that the GitHub branches associated with the issue numbers contain the Dockerfile and Jupyter Notebook scripts that reproduce these preliminary quality assurance results and can be run in a browser using MyBinder. The ROI registration scripts for the phantoms and T1 fitting pipeline to process all datasets are hosted in this GitHub repository, https://github.com/rrsg2020/t1_fitting_pipeline. All the analyses of the datasets were done using Jupyter Notebooks and are available in this repository, https://github.com/rrsg2020/analysis, which also contains a Dockerfile to reproduce the environment using a tool like MyBinder. A dashboard was developed to explore the datasets information and results in a browser, which is accessible here, https://rrsg2020.dashboards.neurolibre.org, and the code is also available on GitHub: https://github.com/rrsg2020/rrsg2020-dashboard.

REFERENCES#

Avants, Brian B., Nick Tustison, and Gang Song. 2009. “Advanced Normalization Tools (ANTS).” The Insight Journal 2 (365): 1–35.

Bane, Octavia, Stefanie J. Hectors, Mathilde Wagner, Lori L. Arlinghaus, Madhava P. Aryal, Yue Cao, Thomas L. Chenevert, et al. 2018. “Accuracy, Repeatability, and Interplatform Reproducibility of T1 Quantification Methods Used for DCE-MRI: Results from a Multicenter Phantom Study.” Magn. Reson. Med. 79 (5): 2564–75.

Barral, Joëlle K., Erik Gudmundson, Nikola Stikov, Maryam Etezadi-Amoli, Petre Stoica, and Dwight G. Nishimura. 2010. “A Robust Methodology for in Vivo T1 Mapping.” Magn. Reson. Med. 64 (4): 1057–67. Beg, Taka, Kluyver, Konovalov, Ragan-Kelley, Thiery, and Fangohr. 2021. “Using Jupyter for Reproducible Scientific Workflows.” https://www.computer.org › Csdl › Magazine › 2021/02https://www.computer.org › Csdl › Magazine › 2021/02 23 (March): 36–46.

Boettiger, Carl. 2015. “An Introduction to Docker for Reproducible Research.” ACM SIGOPS Operating Systems Review 49 (1): 71–79.

Bottomley, P. A., T. H. Foster, R. E. Argersinger, and L. M. Pfeifer. 1984. “A Review of Normal Tissue Hydrogen NMR Relaxation Times and Relaxation Mechanisms from 1-100 MHz: Dependence on Tissue Type, NMR Frequency, Temperature, Species, Excision, and Age.” Medical Physics 11 (4): 425–48.

Cabana, Jean-François, Ye Gu, Mathieu Boudreau, Ives R. Levesque, Yaaseen Atchia, John G. Sled, Sridar Narayanan, et al. 2015. “Quantitative Magnetization Transfer Imagingmadeeasy with qMTLab: Software for Data Simulation, Analysis, and Visualization.” Concepts in Magnetic Resonance. Part A, Bridging Education and Research 44A (5): 263–77.

Captur, Gabriella, Peter Gatehouse, Kathryn E. Keenan, Friso G. Heslinga, Ruediger Bruehl, Marcel Prothmann, Martin J. Graves, et al. 2016. “A Medical Device-Grade T1 and ECV Phantom for Global T1 Mapping Quality Assurance-the T1 Mapping and ECV Standardization in Cardiovascular Magnetic Resonance (T1MES) Program.” Journal of Cardiovascular Magnetic Resonance: Official Journal of the Society for Cardiovascular Magnetic Resonance 18 (1): 58.

Cheng, Hai-Ling Margaret, and Graham A. Wright. 2006. “Rapid High-resolutionT1 Mapping by Variable Flip Angles: Accurate and Precise Measurements in the Presence of Radiofrequency Field Inhomogeneity.” Magnetic Resonance in Medicine. https://doi.org/10.1002/mrm.20791.

Damadian, R. 1971. “Tumor Detection by Nuclear Magnetic Resonance.” Science 171 (3976): 1151–53.

Deoni, Sean C. L., Brian K. Rutt, and Terry M. Peters. 2003. “Rapid combinedT1 andT2 Mapping Using Gradient Recalled Acquisition in the Steady State.” Magnetic Resonance in Medicine. https://doi.org/10.1002/mrm.10407.

Dieringer, Matthias A., Michael Deimling, Davide Santoro, Jens Wuerfel, Vince I. Madai, Jan Sobesky, Florian von Knobelsdorff-Brenkenhoff, Jeanette Schulz-Menger, and Thoralf Niendorf. 2014. “Rapid Parametric Mapping of the Longitudinal Relaxation Time T1 Using Two-Dimensional Variable Flip Angle Magnetic Resonance Imaging at 1.5 Tesla, 3 Tesla, and 7 Tesla.” PloS One 9 (3): e91318.

Drain, L. E. 1949. “A Direct Method of Measuring Nuclear Spin-Lattice Relaxation Times.” Proceedings of the Physical Society. Section A 62 (5): 301.

Ernst, R. R., and W. A. Anderson. 1966. “Application of Fourier Transform Spectroscopy to Magnetic Resonance.” The Review of Scientific Instruments 37 (1): 93–102.

Fram, E. K., R. J. Herfkens, G. A. Johnson, G. H. Glover, J. P. Karis, A. Shimakawa, T. G. Perkins, and N. J. Pelc. 1987. “Rapid Calculation of T1 Using Variable Flip Angle Gradient Refocused Imaging.” Magnetic Resonance Imaging 5 (3): 201–8.

Fryback, D. G., and J. R. Thornbury. 1991. “The Efficacy of Diagnostic Imaging.” Medical Decision Making: An International Journal of the Society for Medical Decision Making 11 (2): 88–94. Hahn, Erwin L. 1949. “An Accurate Nuclear Magnetic Resonance Method for Measuring Spin-Lattice Relaxation Times.” Physical Review. https://doi.org/10.1103/physrev.76.145.

Karakuzu, Agah, Mathieu Boudreau, Tanguy Duval, Tommy Boshkovski, Ilana Leppert, Jean-François Cabana, Ian Gagnon, et al. 2020. “qMRLab: Quantitative MRI Analysis, under One Umbrella.” Journal of Open Source Software 5 (53): 2343.

Karakuzu, Agah, Elizabeth DuPre, Loic Tetrel, Patrick Bermudez, Mathieu Boudreau, Mary Chin, Jean-Baptiste Poline, Samir Das, Pierre Bellec, and Nikola Stikov. 2022. “NeuroLibre : A Preprint Server for Full-Fledged Reproducible Neuroscience.” https://doi.org/10.31219/osf.io/h89js.

Keenan, Kathryn E., Maureen Ainslie, Alex J. Barker, Michael A. Boss, Kim M. Cecil, Cecil Charles, Thomas L. Chenevert, et al. 2018. “Quantitative Magnetic Resonance Imaging Phantoms: A Review and the Need for a System Phantom.” Magnetic Resonance in Medicine: Official Journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine 79 (1): 48–61.

Keenan, Kathryn E., Joshua R. Biller, Jana G. Delfino, Michael A. Boss, Mark D. Does, Jeffrey L. Evelhoch, Mark A. Griswold, et al. 2019. “Recommendations towards Standards for Quantitative MRI (qMRI) and Outstanding Needs.” Journal of Magnetic Resonance Imaging: JMRI 49 (7): e26–39.

Keenan, Kathryn E., Zydrunas Gimbutas, Andrew Dienstfrey, Karl F. Stupic, Michael A. Boss, Stephen E. Russek, Thomas L. Chenevert, et al. 2021. “Multi-Site, Multi-Platform Comparison of MRI T1 Measurement Using the System Phantom.” PloS One 16 (6): e0252966.

Kluyver, Thomas, Benjamin Ragan-Kelley, Brian Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica Hamrick, et al. 2016. “Jupyter Notebooks – a Publishing Format for Reproducible Computational Workflows.” In Positioning and Power in Academic Publishing: Players, Agents and Agendas, 87–90. Amsterdam, NY: IOS Press.

Lazari, Alberto, and Ilona Lipp. 2021. “Can MRI Measure Myelin? Systematic Review, Qualitative Assessment, and Meta-Analysis of Studies Validating Microstructural Imaging with Myelin Histology.” NeuroImage 230 (April): 117744.

Mancini, Matteo, Agah Karakuzu, Julien Cohen-Adad, Mara Cercignani, Thomas E. Nichols, and Nikola Stikov. 2020. “An Interactive Meta-Analysis of MRI Biomarkers of Myelin.” eLife 9 (October). https://doi.org/10.7554/eLife.61523.

Marques, José P., Tobias Kober, Gunnar Krueger, Wietske van der Zwaag, Pierre-François Van de Moortele, and Rolf Gruetter. 2010. “MP2RAGE, a Self Bias-Field Corrected Sequence for Improved Segmentation and T1-Mapping at High Field.” NeuroImage. https://doi.org/10.1016/j.neuroimage.2009.10.002.

McCarthy, Paul. 2019. FSLeyes. https://doi.org/10.5281/zenodo.3403671.

Merkel, Dirk. 2014. “Docker: Lightweight Linux Containers for Consistent Development and Deployment.” 2014. https://www.seltzer.com/margo/teaching/CS508.19/papers/merkel14.pdf.

Plotly Technologies Inc. 2015. Collaborative Data Science. https://plot.ly.

Project Jupyter, Matthias Bussonnier, Jessica Forde, Jeremy Freeman, Brian Granger, Tim Head, Chris Holdgraf, et al. 2018. “Binder 2.0 - Reproducible, Interactive, Sharable Environments for Science at Scale.” In Proceedings of the Python in Science Conference. SciPy. https://doi.org/10.25080/majora-4af1f417-011.

Pykett, I. L., and P. Mansfield. 1978. “A Line Scan Image Study of a Tumorous Rat Leg by NMR.” Physics in Medicine and Biology 23 (5): 961–67.

Redpath, T. W., and F. W. Smith. 1994. “Technical Note: Use of a Double Inversion Recovery Pulse Sequence to Image Selectively Grey or White Brain Matter.” The British Journal of Radiology 67 (804): 1258–63.

Schweitzer, Mark. 2016. “Stages of Technical Efficacy: Journal of Magnetic Resonance Imaging Style.” Journal of Magnetic Resonance Imaging: JMRI 44 (4): 781–82.

Seiberlich, Nicole, Vikas Gulani, Adrienne Campbell, Steven Sourbron, Mariya Ivanova Doneva, Fernando Calamante, and Houchun Harry Hu. 2020. Quantitative Magnetic Resonance Imaging. Academic Press. Sled, J. G., and G. B. Pike. 2001. “Quantitative Imaging of Magnetization Transfer Exchange and Relaxation Properties in Vivo Using MRI.” Magnetic Resonance in Medicine: Official Journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine 46 (5): 923–31.

Stikov, Nikola, Mathieu Boudreau, Ives R. Levesque, Christine L. Tardif, Joëlle K. Barral, and G. Bruce Pike. 2015. “On the Accuracy of T1 Mapping: Searching for Common Ground.” Magn. Reson. Med. 73 (2): 514–22.

Stupic, Karl F., Maureen Ainslie, Michael A. Boss, Cecil Charles, Andrew M. Dienstfrey, Jeffrey L. Evelhoch, Paul Finn, et al. 2021. “A Standard System Phantom for Magnetic Resonance Imaging.” Magnetic Resonance in Medicine: Official Journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine 86 (3): 1194–1211.

Tofts, P. S. 1997. “Modeling Tracer Kinetics in Dynamic Gd-DTPA MR Imaging.” Journal of Magnetic Resonance Imaging: JMRI 7 (1): 91–101.

Wansapura, J. P., S. K. Holland, R. S. Dunn, and W. S. Ball Jr. 1999. “NMR Relaxation Times in the Human Brain at 3.0 Tesla.” Journal of Magnetic Resonance Imaging: JMRI 9 (4): 531–38.

Yuan, Jing, Steven Kwok Keung Chow, David Ka Wai Yeung, Anil T. Ahuja, and Ann D. King. 2012. “Quantitative Evaluation of Dual-Flip-Angle T1 Mapping on DCE-MRI Kinetic Parameter Estimation in Head and Neck.” Quantitative Imaging in Medicine and Surgery 2 (4): 245–53.

Footnotes#

  1. https://blog.ismrm.org/2019/12/12/reproducibility-challenge-2020-join-the-reproducible-research-and-quantitative-mr-study-groups-in-their-efforts-to-standardize-t1-mapping/

  2. https://collaborate.nist.gov/mriphantoms/bin/view/MriPhantoms/SimpleImagingInstructions

  3. Source: https://qmri.com/cmri-product-resources/#premium-system-resources

  4. This website was provided as a resource to the participants for best practices to obtain informed consent for data sharing https://www.uu.nl/en/research/research-data-management/guides/informed-consent-for-data-sharing

  5. http://www-mrsrl.stanford.edu/~jbarral/t1map.html

  6. https://github.com/rrsg2020/analysis

  7. https://mybinder.org/v2/gh/rrsg2020/analysis/master?filepath=analysis

  8. Dashboard website: https://rrsg2020.dashboards.neurolibre.org/, code repository: https://github.com/rrsg2020/rrsg2020-dashboard

  9. Only T1 maps measured using phantom version 1 were included in this inter-participant COV, as including both sets would have increased the COV due to the differences in reference T1 values. There were seven participants that used version 1, and six that used version 2.